Survey on MapReduce Scheduling Algorithms
Abstract
MapReduce is a programming model used by Google to process large amounts of data in a distributed computing environment. It is typically used to perform distributed computing on clusters of computers, operating on data stored in either a file system or a database. MapReduce takes advantage of data locality, processing data on or near the nodes that store it, thereby avoiding unnecessary data transmission. The simplicity of the programming model and the automatic handling of node failures, which hides the complexity of fault tolerance, have led to MapReduce being used for both commercial and scientific applications. As MapReduce clusters have become popular, their scheduling has become an important factor to consider. To achieve good performance, a MapReduce scheduler must avoid unnecessary data transmission; hence, different scheduling algorithms for MapReduce are needed. This paper provides an overview of four scheduling algorithms for MapReduce: the scheduling algorithm in Hadoop, the Longest Approximate Time to End (LATE) MapReduce scheduling algorithm, the Self-Adaptive MapReduce (SAMR) scheduling algorithm, and the Enhanced Self-Adaptive MapReduce (ESAMR) scheduling algorithm. The advantages and disadvantages of these algorithms are identified.
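The data-locality idea in the abstract can be illustrated with a minimal sketch. It is not taken from the paper or from any of the four surveyed schedulers; the function name `pick_task`, the node names, and the task-to-replica mapping are all hypothetical, and the fallback rule (take the first pending task) is an assumption chosen for brevity.

```python
from typing import Dict, Optional, Set


def pick_task(free_node: str, pending: Dict[str, Set[str]]) -> Optional[str]:
    """Pick a pending task for free_node, preferring node-local input.

    `pending` maps a task id to the set of nodes that hold a replica of
    that task's input block (a hypothetical, simplified cluster view).
    """
    # First choice: a task whose input is already stored on the free node,
    # so no data has to cross the network.
    for task_id, replica_nodes in pending.items():
        if free_node in replica_nodes:
            return task_id
    # Fallback (assumption): no local task exists, so take the first
    # pending task and accept a remote read. Returns None if nothing is pending.
    return next(iter(pending), None)


pending_tasks = {"map-0": {"node-a", "node-b"}, "map-1": {"node-c"}}
local_choice = pick_task("node-c", pending_tasks)
remote_choice = pick_task("node-d", pending_tasks)
```

Here `node-c` receives `map-1` because it stores that task's input, while `node-d` holds no replica and falls back to a remote read of `map-0`.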
Similar resources
Scheduling and Energy Efficiency Improvement Techniques for Hadoop Map-reduce: State of Art and Directions for Future Research
MapReduce has become ubiquitous for processing large data volume jobs. As the number and variety of jobs to be executed across heterogeneous clusters are increasing, so is the complexity of scheduling them efficiently to meet required objectives of performance. This report presents a survey of some of the MapReduce scheduling algorithms proposed for such complex scenarios. A taxonomy is provide...
Survey on MapReduce and Scheduling Algorithms in Hadoop
We are living in the data world. It is not easy to measure the total volume of data stored electronically. It is measured in units of zettabytes or exabytes and is referred to as Big Data. Such data can be unstructured, structured, or semi-structured, and it is not convenient to store or process with conventional data management methods or on machines with limited computational power. Hadoop system is used to ...
MapReduce Scheduler: A 360-degree view
Undoubtedly, MapReduce is the most powerful programming paradigm in distributed computing. Enhancing MapReduce is essential and can make computation faster. Therefore, there are many scheduling algorithms to discuss based on their characteristics. Moreover, there are many shortcomings to discover in this field. In this article, we present the state-of-the-art scheduling alg...
An Investigation on Scheduling Policies for Cloud-based Software Systems
Background: The rapid diffusion of cloud computing technology has been a focus of interest for enterprises due to its higher scalability and availability and greater elasticity. Nevertheless the limited scheduling mechanisms for running applications in the cloud have been a major challenge. Aim: This project introduces an effective scheduling algorithm, which attempts to maximize cloud resource...
Evaluating map reduce tasks scheduling algorithms over cloud computing infrastructure
Efficiently scheduling MapReduce tasks is considered one of the major challenges facing MapReduce frameworks. Many algorithms have been introduced to tackle this issue. Most of these algorithms focus on the data locality property for task scheduling. Data locality may cause lower physical resource utilization in non-virtualized clusters and higher power consumption. Virtualized clust...